Exploring the association of ESR1 and ESR2 gene SNPs with polycystic ovary syndrome in human females: a comprehensive association study

Background Polycystic Ovary Syndrome (PCOS) affects a significant proportion of human females worldwide and is characterized by hormonal, metabolic, and reproductive dysfunctions, including infertility, irregular menstrual cycles, acanthosis nigricans, and hirsutism. Mutations in the estrogen receptor genes ESR1 and ESR2, involved in normal follicular development and ovulation, can contribute to development of the PCOS. The present study focuses on investigating the potential correlation between single nucleotide polymorphisms (SNPs) of ESR1 and ESR2 genes and the incidence of this syndrome. Methods For this study, SNPs in ESR1 and ESR2 genes were retrieved from the ENSEMBL database and analyzed for their effect on mutated proteins using different bioinformatics tools including SIFT, PolyPhen, CADD, REVEL, MetaLR, I-Mutant, CELLO2GO, ProtParam, SOPMA, SWISS-MODEL and HDDOCK. Results All the SNPs documented in the present study were deleterious. All the SNPs except rs1583384537, rs1450198518, and rs78255744 decreased protein stability. Two variants rs1463893698 and rs766843910 in the ESR2 gene altered the localization of mutated proteins i.e. in addition to the nucleus, proteins were also found in mitochondria and extracellular, respectively. SNPs rs104893956 in ESR1 and rs140630557, rs140630557, rs1596423459, rs766843910, rs1596405923, rs762454979 and rs1384121511 in ESR2 gene significantly changed the secondary structure of proteins (2D). SNPs that markedly changed 3D configuration included rs1554259481, rs188957694 and rs755667747 in ESR1 gene and rs1463893698, rs140630557, rs1596423459, rs766843910, rs1596405923, rs762454979 and rs1384121511 in ESR2 gene. Variants rs1467954450 (ESR1) and rs140630557 (ESR2) were identified to reduce the binding tendency of ESRα and β receptors with estradiol as reflected by the docking scores i.e. -164.97 and -173.23, respectively. Conclusion Due to the significant impact on the encoded proteins, these variants might be proposed as biomarkers to predict the likelihood of developing PCOS in the future and for diagnostic purposes. Supplementary Information The online version contains supplementary material available at 10.1186/s13048-023-01335-7.


Background
Polycystic Ovary Syndrome (PCOS) is a complicated disorder affecting the female reproductive system.It is characterized by the presence of multiple egg-containing collagen-filled follicles that are arrested during growth, a thick ovarian capsule termed the tunica albuginea, central adiposity, obesity, high luteinizing hormone serum levels, hirsutism, and acne caused by the presence of excess androgens.One of the main concerns for most human females is the infertility associated with PCOS due to anovulatory or oligoovulatory infertility and menstrual irregularities.Even after conception many human females are more prone to miscarriage due to this condition [1][2][3].Systemic complications include a higher risk of cardiovascular disease, dyslipidemia and hypertension, insulin resistance and a four-fold increased risk of developing Type 2 diabetes.PCOS is also associated with obstructive sleep apnea, endometrial cancer, depression and lipid abnormalities [4][5][6].
The risk factors linked with the pathophysiology of PCOS can be categorized into three groups, namely genetic, non-genetic, and hormonal.The PCOS associated genes can be classified into six categories: genes associated with adrenal and ovarian steroidogenesis, including cytochrome P450, family 11, sub-family A, member 1 (CYP11A1), cytochrome P450, family 17 (CYP17), cytochrome P450, family 19 (CYP19) and cytochrome P450, family 21 (CYP21); genes associated with the effects of steroid hormones such as androgen receptor gene (AR) and sex hormone binding globular protein (SHBG); genes regulating gonadotropin activity and regulation such as luteinizing hormone (LH), follicle stimulating hormone receptor (FSHR) and anti-mullerian hormone (AMH); genes linked to the activity and release of insulin such as calpain 10 (CAPN10), insulin receptor substrate-1 (IRS-1), insulin receptor substrate-2 (IRS-2) and insulin (INS) etc. [3,[7][8][9][10].Additionally Environmental factors that contribute to PCOS such as environmental endocrine disruptors (EEDs), obesity, and diet, have not been extensively documented [11].In addition, hormonal factors such as the presence of hyperandrogenism and insulin resistance are also considered to contribute towards PCOS [12,13].
The ESR1 and ESR2 genes comprise of 8 and 9 exons, respectively.Their chromosomal locations are 6q25.1 and 14q23.1,respectively [14].ESR1 and ESR2 genes encode for ESRα and ESRβ proteins which function as binding sites for estradiol during the process of follicle development and ovulation.Estradiol is a determinant of follicle quality [15].ESRα expression occurs in interstitial, thecal and granulosa cells (GC) of developing antral follicles while ESRβ is only expressed in GC part of follicle.Both these receptor proteins have spliced isoform i.e. three and four for ESRα (ESRα66, ESRα46 and ESRα36) and ESRβ (ESRβ1 to ESRβ5), respectively [16,17].Under normal conditions, estradiol binds with ESRα and β receptors and acts synergistically with follicle stimulating hormone (FSH) to upregulate the expression of steroidgenic hormones including LHR receptor resulting in prominent preovulatory follicle selection and ovulation [18].Hence, estradiol plays vital role in normal follicle development which is strongly related with ESRα and ESRβ proteins expression [19].These two receptors play a vital role in regulating estrogen functions and dimerize to control the transcription of several downstream genes involved in physiological ovarian functions [17,20,21].ESRs expression has been found to be increased in epithelium and stroma during proliferative stage of female reproductive cycle [22,23].
Literature also supports the link between ESRs polymorphisms and PCOS like single nucleotide variants of ESRα rs9340799 and rs1999805 have been found to be associated with PCOS patients of Pakistan and China, respectively [24,25].An ESRα associated polymorphism rs2234693 has been reported to be found more frequently in PCOS patients than in normal controls [26].Connection between ESR1 and 2 genes mutations and PCOS is explained in Fig. 1.
Keeping in view the above mentioned association of ESRα and β receptors with PCOS clinical manifestations, these receptor proteins might be considered as most significant markers of PCOS.Any mutations in the ESRα and β encoding genes may disrupt the normal development of follicles, resulting in polycystic ovaries.The present study aimed to identify SNPs in ESR1 and 2 genes that may lead to abnormal development of ESR α and β receptors.These SNPs could potentially serve as biomarkers for the prognosis and diagnosis of PCOS.

Retrieving coding sequences of genes and reported SNPs from ENSEMBL
To retrieve the coding sequences of ESR1 (ENST00000206249.8) and ESR2 (ENST00000341099.6)genes and the SNPs, ENSEMBL database (https:// asia.ensem bl.org/ index.html, accessed on 1 September 2022) was explored.SNPs were incorporated in normal CDS of genes to generate mutated sequences.The coding sequences of normal genes are shown (Supplementary data Table 1).Stop gained, missense and frame shift mutations retrieved and addressed in present study are described in detail (Table 1).

Translation of normal and mutated genes sequences
ExPaSy tool (https:// web.expasy.org/ trans late/, accessed on 1 September 2022) was employed to convert nucleotide sequences of ESR1 and ESR2 genes into amino acid sequences [27].

Effect of missense SNPs on stability of mutated proteins
To determine the effect of missense SNPs on stability of mutated proteins, I-Mutant suit (https:// foldi ng.biofo ld.org/i-mutant/ i-mutan t2.0.html, accessed on 7 September 2022) was used.Gibbs free energy (DGG) and reliability index (RI) were measured.Value of DGG above zero shows increase while below zero represents decrease in stability of mutated proteins [33].

Physicochemical properties prediction
Physicochemical properties like number of amino acids, molecular weight, isoelectric point (pI), half-life, extinction coefficient, aliphatic index, instability index and grand average of hydropathicity (GRAVY) were determined for the normal as well as mutated proteins using ProtParam tool (https:// web.expasy.org/ protp aram/, accessed on 7 September 2022).

Secondary structure prediction
To predict the effect of SNPs on 2D of mutated protein, SOPMA secondary structure predicted method (https:// npsa-prabi.ibcp.fr/ cgi-bin/ npsa_ autom at.pl? page=/ NPSA/ npsa_ sopma.html, accessed on 7 September 2022) was employed.Using this method, percentage of alpha helix, extended strand, beta turn and  [17] random coil was determined for all the SNPs containing amino acid sequences and results were compared with the normal protein [35].Disordered residues were assessed in normal and mutated sequences using protein disorder prediction server i.e.PrDOS (https:// prdos.hgc.jp/ cgi-bin/ result.cgi? ppid= 37623 1p1d1 66270 6327, accessed on 13 September 2022).This tool helped to predict the effect of SNPs on number of disordered regions and number of residues exhibiting disorder.The results were obtained in the form of disorder profile plot and sequence of proteins with disordered regions highlighted in red [36].

Three dimensional structure prediction and validation
To detect the effect of SNPs on 3D configuration of proteins, homology modelling server SWISS-MODEL (https:// swiss model.expasy.org, accessed on August 2023) was used.The pdb structure derived using this tool was validated through ERRAT and PROCHECK (https:// saves.mbi.ucla.edu/ resul ts? job= 10728 13&p= errat, accessed on 26 August 2023) tools [37].In ERRAT validation, the quality was calculated for pdb structures of normal and mutant proteins [38].On the other hand, in PROCHECK validation, G-value and Ramachandran plots were predicted for the normal and mutant cases.Ramachandran plots helped us in assessment of amino acids in most favored, additional allowed, generously allowed and disallowed regions [39].

Docking analysis
To analyze the effect of SNPs on interaction of mutated forms of ESR1 and ESR2 with native form of estradiol hormone, docking was performed using web server for protein-protein docking i.e.HDOCK server (hdock.phys.hust.edu.cn,accessed on August 2023).The estradiol pdb structure was uploaded as input receptor molecule while the mutated structures of ESR1 and ESR2 were uploaded as ligands.Docking score, confidence score and ligand RMSD values were recorded.

Deleteriousness of SNPs
The present study has documented a total of fifteen SNPs (four stop-gained and eleven missense) of the ESR1 gene and fifteen SNPs (six stop-gained and nine missense) of the ESR2 gene.Deleteriousness analysis of the missense SNPs showed the pathogenicity of all the missense mutations addressed in present study (Table 1).

Analysis of SNPs effect on stability of mutated proteins
To determine the effect of SNPs on stability of mutated proteins, I-Mutant tool was used.It was found that in case of ESR1 gene, two SNPs i.e. rs1583384537 and rs1584799119 increased stability of mutated proteins as compared to remaining polymorphisms which had decreasing effect.On the other hand, in case of ESR2 gene, two out of six missense mutations i.e. rs1450198518 and rs78255744 showed increasing effect while remaining four SNPs reduced the mutated proteins stability (Table 2).

Analysis of SNPs effect on sub-cellular localization of mutated proteins
The sub-cellular localization prediction revealed that in case of ESR1 gene, only one SNP rs1554259481 altered the localization of mutated protein from nuclear to nuclear and cytoplasmic (Supplementary data Fig.S1).
In case of ESR2 gene, the localization remained unaffected.While in case of ESR2 gene, two SNPs i.e. rs1463893698 and rs766843910 changed the localization of mutated proteins from normal location (nuclear) to nuclear: mitochondrial and nuclear: extracellular, respectively (Table 4, Supplementary data Fig.S2).

Analysis of SNPs effect on secondary structures of mutated proteins
Effect of SNPs on different aspects of 2D structure of mutated proteins i.e. alpha helix, extended strand, beta turn, random coil, number of disordered regions  and S4).
In case of ESR2 gene, all the mutations were observed to change the secondary structure of mutated proteins.The highest and lowest alterations in normal values were observed to be 25.15% (rs762454979) and 0% (rs1463893698), respectively in case of alpha helix, 16.19% (rs1463893698) and 7.17% (rs78255744), respectively in case of extended strand, 4.24% (rs766843910) and 1.52% (rs140630557), respectively in case of beta turn and 80.28% (rs1463893698) and 51.05% (rs1257844897), respectively in case of random coil.
The highest and lowest deviations from the normal values of number of disordered regions (177) and number of amino acids in disordered region (7) were observed to be 189 and 7 in case of SNP rs1414263985 and 28 and 2 in case of SNP rs1463893698, respectively (Supplementary data Figs.S5 and S6).

Analysis of SNPs effect on 3D structure of mutated proteins
The SWISSMODEL based analysis of mutated proteins of ESRα gene revealed SNPs rs1554259481, rs188957694 and rs755667747 drastically altered the 3D configuration of mutated proteins.While SNPs rs1583384537 and rs1131692059 caused slight change in structure.However, all other mutations addressed in present study did not affect the overall configuration of mutated proteins (Fig. 2).
In case of ESRβ gene, mutations rs1463893698, rs140630557, rs1596423459, rs766843910, rs1596405923, rs762454979 and rs1384121511 induced considerable change in mutated proteins.Slight change in 3D structure has been observed in case of SNP rs1257844897.While in all other cases, no effect was observed on tertiary structure of proteins (Fig. 3).
The pdb structures of normal and mutated proteins generated using PHYRE2 tool were subjected to ERRAT and Ramachandran plot analysis.In case of ESRα gene, the values for all the structures including normal and mutated proteins, the ERRAT scores were above 70% and hence, are considered of good quality (Supplementary data Fig.S7).However, in cases of two stop gained SNPs i.e. rs1554259481 and rs104893956 which caused formation of truncated proteins, the overall quality factor was below 50% while in cases of SNPs rs755667747 and rs1467954450, the quality factor was below 70% but above 50%.In ESRβ gene encoded normal and mutated proteins, in all the cases the overall quality factor value was above 70% except rs1596423459, rs140630557, rs766843910 and rs1596405923.For the mutated structure induced by stop-gained mutation rs140630557, no quality score was predicted by ERRAT (Supplementary data Fig.S8).As far as the validation of protein structures using Ramachandran plots is concerned, the residues found in most favored regions were above or closer to 90% and hence qualified the good quality score.However, in cases of ESRα gene SNPs rs1554259481 and rs104893956 and ESRβ gene SNP rs140630557, the scores were in the range of 70% (Supplementary data Figs.S9 and S10).The G-values were closer to zero in all the cases further validating the protein structures.

Discussion
There have been several studies showing the correlation between SNPs of ESR1 and ESR2 genes with PCOS.These studies were based on comparison of genes between the healthy and the diseased individuals using experimental methods including biochemical and hormonal analysis,  restriction fragment length polymerase chain reaction (RFLP-PCR), real-time polymerase chain reaction (RT-PCR) and Sanger sequencing [24,25,40,41].
A study focusing at finding ESR1 and ESR2 gene markers associated with PCOS in Tunisian human females revealed strong association of SNVs rs2234693 and rs3798577 in ESRα gene and rs1256049 in ESRβ gene with this disease [40].Another study has reported the association of SNP rs1999805 in ESR1 gene with PCOS in Chinese population [25].In Pakistani human females from Punjab, three SNPs i.e. rs2234693, rs9340799 and rs8179176 in ESR1 gene and rs4986938 in ESR2 gene have been reported to be significantly associated with PCOS [24].Another research project investigated the correlation between the SNP rs4986938 of the ESR2 gene with PCOS in unmarried Iraqi human females.This study focused on the substitution of G allele with A allele, with the results indicating the A allele had a higher association with PCOS in comparison to the wild type G allele [42].
Most of the studies available in the literature have focused on biomarkers in intronic and untranslated regions (UTRs) while data about exonic region SNPs is considerably scarce.Moreover, there is lack of detailed information regarding the impact of these mutations on physicochemical characteristics, localization and 2D and 3D structures.The current study aimed to investigate the effects of ESR1 and ESR2 genes SNPs reported in Human Genome Project on the attributes of encoded proteins.The SNPs that were focused in present study have not been previously examined in literature.
The pI shows acidity or alkalinity of mutated proteins.In ESR1 gene, only the mutation rs104893956 enhanced acidity of mutated protein with pI = 4.73 and the mutations rs755667747, rs1467954450 and rs1253340312 had reducing effect on acidity.As far as the ESR2 gene is concerned, the SNP rs1463893698 induced acidity in mutated protein while the mutations rs766843910, rs1596405923, rs762454979 and rs1384121511 increased alkalinity of mutated proteins.
Aliphatic index is the measure of number of amino acids with aliphatic side chain in a protein which reflects protein thermostability over wide range of temperatures.If the value ranges between 66.5 to 84.33 then the protein is considered highly stable [43].In case of ESR1 gene, only two SNPs i.e. rs755667747 and rs1467954450 decreased thermostability of mutated proteins.The highest thermostability was observed in case of SNP rs1253340312.In case of ESR2 gene, two SNPs i.e. rs1596423459 and rs766843910 reduced the thermostability of mutated proteins while the highest thermostability was induced by mutations rs1257844897 and rs200502775.Instability index indicates stability of protein in test tube [44].Instability index below 40 reflects protein stability.Among the SNPs documented in ESR1 gene, only the mutation rs1554259481 has increasing effect on protein stability with instability index of 17.52.As far as the ESR2 gene is concerned, all the mutated proteins just like the normal one were found unstable with instability index above 40.Two SNPs rs1463893698 and rs140630557 reduced stability of mutated proteins to greater extent.GRAVY indicates the polarity level of proteins [45].In cases of both the ESR1 and ESR2 genes, the negative values of GRAVY indicate that all the mutated proteins are hydrophilic just like the normal one.
In ESR1 gene, three documented SNPs i.e. rs1554259481, rs104893956 and rs755667747 have been found to cause drastic change in physicochemical properties, 2D and 3D structures of mutated proteins.In ESR2 gene, the seven SNPs i.e. rs1463893698, rs140630557, rs1596405923, rs1596423459, rs762454979, rs1384121511 and rs766843910 caused significant alterations in physicochemical properties, 2D and 3D structures of mutant proteins.So, these SNPs can be used as susceptibility markers for PCOS.
Binding of E2 with ESR1 and ESR2 receptors is a crucial event in biological actions of this hormone [46].Any mutation altering the different attributes of ESR1 and ESR2 receptors might lead to disturbance of the binding tendency of E2 with these receptor proteins and might have serious influence on body functions.Therefore, alteration in binding tendency of ESR1 and ESR2 proteins mutated forms with E2 has been observed in present study.Single nucleotide variants i.e. rs1467954450 (ESR1 receptor), rs1463893698 and rs140630557 (ESR2 receptor) markedly reduced this binding as revealed by their docking scores.Although literature reports the association of ESR1 gene heterozygous mutation c. 619G > A/p.A207T with insensitivity of the encoded receptor towards E2 as well as the association of ESR genes polymorphisms with disturbances in E2 concentration in PCOS patients [21,47].However, this is the first ever study reporting the effect of rs1467954450 (ESR1 receptor), rs1463893698 and rs140630557 SNPs in ESRα and ESRβ genes on binding with E2.This reduced binding affinity might also contribute to disturbance in effective level of E2 hormone in the body.

Conclusion
The current in-silico study has demonstrated a strong association between ten SNPs present in ESR1 and ESR2 genes with PCOS.However, it is vital to conduct further research to assess the potential of these mutations as PCOS biomarkers.Identifying these SNPs could assist in predicting the likelihood of PCOS development in human females.These findings could also contribute to the development of targeted therapies for PCOS and help improve the understanding of the underlying molecular mechanisms involved in the pathogenesis of the disorder.

Fig. 2
Fig. 2 Effect of ESR1 gene SNPs documented in present study on 3D structure of mutated proteins predicted using SWISSMODEL

Fig. 3
Fig. 3 Effect of ESR2 gene SNPs documented in present study on 3D structure of mutated proteins predicted using SWISSMODE

Table 1
Single nucleotide polymorphisms (SNPs) in ESR1 and ESR2 genes associated with PCOS addressed in present study

Table 2
Effect of SNPs on stability of mutated proteins predicted using I-Mutant at 25°C and pH = 7 RI Reliability index, DDG Gibbs free energy value, calculated by formula DG (new protein) -DG (wild type)

Table 3
Prediction of effect of SNPs on physicochemical properties of mutated proteins

Table 4
Effect of SNPs on stability of mutated protein predicted using I-Mutant